Occlusion-Aware Object Localization, Segmentation and Pose Estimation
Authors
Abstract
We present a learning approach for localization and segmentation of objects in an image in a manner that is robust to partial occlusion. Our algorithm, Segmentation and Detection using Higher-Order Potentials (SD-HOP), produces a bounding box around the full extent of the object and labels the pixels in its interior that belong to the object. This differs from semantic segmentation, which does not provide information about the spatial position of labelled pixels inside the object. A common theme in the literature is to model occlusion either geometrically or by appearance, thereby allowing it to contribute to the detection process. Geometric approaches often make simplifying assumptions about occluder and scene geometry. Our appearance-based approach avoids these assumptions and performs better than existing appearance-based approaches due to the use of higher-order potentials for modelling neighbour influence and a loss function that targets both localization and segmentation.
SD-HOP discriminatively learns Histogram of Oriented Gradients (HOG) templates for objects and occlusion. Whereas the object templates model the objects of interest, the occlusion templates provide discriminative support and do not model a specific occluder. Segmentation is done by considering the response of patches to these templates and the influence of neighbouring patches through a conditional random field (CRF) with higher-order connections.
The training phase requires a set of images with different occlusions of the object(s) of interest. Each training sample is (1) over-segmented and (2) annotated with a bounding box around the full extent of the object and a binary segmentation of the area inside the box into object vs. non-object pixels. Given these, we train a structured Support Vector Machine (SVM) that learns the HOG templates and CRF weights. Object segmentation is done by assigning binary labels to HOG cells within the bounding box, 1 for visible and 0 for occluded.
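The text does not say how the pixel-level binary annotation is converted into per-cell visibility labels. The following is a minimal sketch under the assumption that a cell is labelled visible when more than half of its pixels are object pixels. The function name `cell_visibility`, the cell size, and the majority threshold are all hypothetical choices for illustration, not details taken from the paper.

```python
from typing import List


def cell_visibility(mask: List[List[int]], cell: int = 8,
                    thresh: float = 0.5) -> List[List[int]]:
    """Turn a binary pixel mask (1 = object) into per-cell labels.

    Assumption: a cell is visible (1) when strictly more than `thresh`
    of its pixels are object pixels, otherwise occluded (0).
    Pixels that do not fill a whole cell at the right/bottom edge are ignored.
    """
    rows = len(mask) // cell
    cols = len(mask[0]) // cell
    labels = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Count object pixels inside the cell at grid position (r, c).
            on = sum(mask[y][x]
                     for y in range(r * cell, (r + 1) * cell)
                     for x in range(c * cell, (c + 1) * cell))
            row.append(1 if on > thresh * cell * cell else 0)
        labels.append(row)
    return labels


# Toy 4x4 mask with 2x2 cells: only the top-left cell is mostly object.
toy_mask = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 1, 1]]
toy_labels = cell_visibility(toy_mask, cell=2)
```

Flattening `toy_labels` row by row gives a visibility vector with one entry per cell of the box.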
Neighbour influence for segmentation can take two forms: (1) pairwise terms that impose a cost when 4-connected neighbouring cells have different labels and (2) higher-order potentials that impose a cost when a cell's label differs from the dominant label in its segment of the image. These segments are produced separately by an unsupervised segmentation algorithm.
The label for an object in an image x is represented as y = (p, v, a), where p is the bounding box, v is a vector of binary variables indicating the visibility of HOG cells within p, and a ∈ [1, A] indexes the discrete viewpoint. p = (p_x, p_y, p_σ) gives the position of the top-left corner and the level in a scale-space pyramid. The width and height of the box are fixed per viewpoint as w_a and h_a HOG cells respectively. Hence v has w_a · h_a elements.
Given a labelled image, a sparse joint feature vector Ψ(x, y) is formed by stacking A vectors, each corresponding to a different discretized viewpoint. Each vector consists of vectorized HOG features and visibility labels of cells, the count of cells in p that lie outside the image boundary, statistics of visibility agreement between 4-connected neighbouring cells and between cells in the same unsupervised segment, and a constant bias. All vectors except the one corresponding to viewpoint a are zeroed out. Learning involves determining linear weights w such that the score ...
Figure 2: 3D pose estimation. Left to right: pose estimation with IRLS, SD-HOP refined segmentation, pose estimation with OR-IRLS.
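The two neighbour-influence statistics and the viewpoint-stacked feature vector described above can be illustrated with a small sketch. It assumes the visibility labels and the unsupervised segment ids are given as 2D grids over the HOG cells of the box. The function names, the grid representation, and the treatment of each per-viewpoint block as an opaque list of numbers are assumptions made for illustration; the exact feature layout used by SD-HOP is not given in this text.

```python
from typing import Dict, List


def pairwise_disagreements(v: List[List[int]]) -> int:
    """Count 4-connected neighbouring cell pairs with different labels."""
    h, w = len(v), len(v[0])
    count = 0
    for r in range(h):
        for c in range(w):
            # Check right and down neighbours so each pair is counted once.
            if c + 1 < w and v[r][c] != v[r][c + 1]:
                count += 1
            if r + 1 < h and v[r][c] != v[r + 1][c]:
                count += 1
    return count


def higher_order_disagreements(v: List[List[int]],
                               seg: List[List[int]]) -> int:
    """Count cells whose label differs from their segment's dominant label."""
    by_segment: Dict[int, List[int]] = {}
    for r, row in enumerate(v):
        for c, label in enumerate(row):
            by_segment.setdefault(seg[r][c], []).append(label)
    count = 0
    for labels in by_segment.values():
        ones = sum(labels)
        zeros = len(labels) - ones
        # The minority cells are exactly those that disagree with the majority.
        count += min(ones, zeros)
    return count


def joint_feature(block: List[float], a: int, sizes: List[int]) -> List[float]:
    """Stack one block per viewpoint; only viewpoint `a` (1-based) is non-zero."""
    if len(block) != sizes[a - 1]:
        raise ValueError("block length must match the size for viewpoint a")
    psi: List[float] = []
    for k, size in enumerate(sizes, start=1):
        psi.extend(block if k == a else [0.0] * size)
    return psi


# Toy example: a 2x3 grid of cells with two different segmentations.
v = [[1, 1, 0],
     [1, 0, 0]]
seg_aligned = [[0, 0, 1],
               [0, 1, 1]]   # segments coincide with the labelling
seg_rows = [[0, 0, 0],
            [1, 1, 1]]      # one segment per row
n_pair = pairwise_disagreements(v)
n_high_aligned = higher_order_disagreements(v, seg_aligned)
n_high_rows = higher_order_disagreements(v, seg_rows)
psi = joint_feature([1.0, 2.0], a=2, sizes=[3, 2, 1])
```

In this toy case the labelling that follows the segment boundaries incurs no higher-order disagreement, while the row-wise segmentation leaves one minority cell in each segment.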
Similar resources
Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning
Occlusion robust pose and model estimation using Gaussian Process Latent Variable Model on GPU
In this project, we would like to develop multi-object pose and model estimation which uses the 3D voxelized model and foreground background segmentation to detect 3D pose and model variation. In the relevant literature, [4], the authors used gradient descent to minimize the optimization energy function which is the integration over image multiplied by the smoothed step function of SDF of 3D mo...
Object Detection and Segmentation using Discriminative Learning
Jingdan Zhang: Object Detection and Segmentation using Discriminative Learning. (Under the direction of Leonard McMillan.) Object detection and segmentation algorithms need to use prior knowledge of objects’ shape and appearance to guide solutions to correct ones. A promising way of obtaining prior knowledge is to learn it directly from expert annotations by using machine learning techniques. P...
The Best of Both Worlds: Learning Geometry-based 6D Object Pose Estimation
We address the task of estimating the 6D pose of known rigid objects, from RGB and RGB-D input images, in scenarios where the objects are heavily occluded. Our main contribution is a new modular processing pipeline. The first module localizes all known objects in the image via an existing instance segmentation network. The next module densely regresses the object surface positions in its local ...
Latent-Class Hough Forests for 3D Object Detection and Pose Estimation
In this paper we propose a novel framework, Latent-Class Hough Forests, for 3D object detection and pose estimation in heavily cluttered and occluded scenes. Firstly, we adapt the state-of-the-art template matching feature, LINEMOD [14], into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. In training, rather than explici...